{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example for using Fasttext"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "try:\n",
    "    import textaugment, gensim\n",
    "except ModuleNotFoundError:\n",
    "    !pip -q install textaugment gensim\n",
    "    import textaugment, gensim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load Fasttext Embeddings \n",
    "\n",
    "Fasttext has Pre-trained word vectors on English webcrawl and Wikipedia which you can find [here](https://fasttext.cc/docs/en/english-vectors.html) as well as Pre-trained models for 157 different languages which you can find [here](https://fasttext.cc/docs/en/crawl-vectors.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2020-09-01 10:11:28--  https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz\n",
      "Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...\n",
      "Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 4503593528 (4.2G) [application/octet-stream]\n",
      "Saving to: ‘cc.en.300.bin.gz’\n",
      "\n",
      "cc.en.300.bin.gz    100%[===================>]   4.19G  4.32MB/s    in 9m 57s  \n",
      "\n",
      "2020-09-01 10:21:26 (7.20 MB/s) - ‘cc.en.300.bin.gz’ saved [4503593528/4503593528]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Download the FastText embeddings in the language of your choice\n",
    "!wget \"https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save path to your pre-trained model\n",
    "from gensim.test.utils import datapath\n",
    "pretrained_path = datapath('./cc.en.300.bin.gz')\n",
    "\n",
    "# load model\n",
    "model = gensim.models.fasttext.load_facebook_model(pretrained_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from textaugment import Word2vec\n",
    "t = Word2vec(model = model.wv)\n",
    "output = t.augment('The stories are good')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cite the paper\n",
    "```\n",
    "@article{marivate2019improving,\n",
    "  title={Improving short text classification through global augmentation methods},\n",
    "  author={Marivate, Vukosi and Sefara, Tshephisho},\n",
    "  journal={arXiv preprint arXiv:1907.03752},\n",
    "  year={2019}\n",
    "}```\n",
    "\n",
    "https://arxiv.org/abs/1907.03752\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
